AlphaFold's AI protein-structure predictions have limits | Science News

2022-09-23 21:24:06 By : Mr. George Zhang

AlphaFold, a deep-learning artificial intelligence system, predicted the structure of the estrogen receptor protein, seen in this illustration binding to DNA (purple). The predicted protein has some parts folded into precise structures (pink) and other areas that resemble free-flowing spaghetti (yellow).

VERONICA FALCONIERI HAYS/SCIENCE SOURCE

As people around the world marveled in July at the most detailed pictures of the cosmos snapped by the James Webb Space Telescope, biologists got their first glimpses of a different set of images — ones that could help revolutionize life sciences research.

The images are the predicted 3-D shapes of more than 200 million proteins, rendered by an artificial intelligence system called AlphaFold. “You can think of it as covering the entire protein universe,” said Demis Hassabis at a July 26 news briefing. Hassabis is cofounder and CEO of DeepMind, the London-based company that created the system. Combining several deep-learning techniques, the computer program is trained to predict protein shapes by recognizing patterns in structures that have already been solved through decades of experimental work using electron microscopes and other methods.

The AI’s first splash came in 2021, with predictions for 350,000 protein structures — including almost all known human proteins. DeepMind partnered with the European Bioinformatics Institute of the European Molecular Biology Laboratory to make the structures available in a public database.

July’s massive new release expanded the library to “almost every organism on the planet that has had its genome sequenced,” Hassabis said. “You can look up a 3-D structure of a protein almost as easily as doing a key word Google search.”

These are predictions, not actual structures. Yet researchers have used some of the 2021 predictions to develop potential new malaria vaccines, improve understanding of Parkinson’s disease, work out how to protect honeybee health, gain insight into human evolution and more. DeepMind has also focused AlphaFold on neglected tropical diseases, including Chagas disease and leishmaniasis, which can be debilitating or lethal if left untreated.

Decades of slow-going experiments have revealed the structure of more than 194,000 proteins, all housed in the Protein Data Bank. In 2021, the AlphaFold project released predicted structures for about 1 million proteins, including almost all known human proteins. This year, the AlphaFold database exploded with predicted structures for more than 200 million proteins.

The release of the vast dataset was greeted with excitement by many scientists. But others worry that researchers will take the predicted structures as the true shapes of proteins. There are still things AlphaFold can’t do — and wasn’t designed to do — that need to be tackled before the protein cosmos completely comes into focus.

Having the new catalog open to everyone is “a huge benefit,” says Julie Forman-Kay, a protein biophysicist at the Hospital for Sick Children and the University of Toronto. In many cases, AlphaFold and RoseTTAFold, another AI researchers are excited about, predict shapes that match up well with protein profiles from experiments. But, she cautions, “it’s not that way across the board.”

Predictions are more accurate for some proteins than for others. Erroneous predictions could leave some scientists thinking they understand how a protein works when really, they don’t. Painstaking experiments remain crucial to understanding how proteins fold, Forman-Kay says. “There’s this sense now that people don’t have to do experimental structure determination, which is not true.”

This plant protein is a kinase, which tacks phosphates onto other molecules, potentially changing their functions.

Proteins start out as long chains of amino acids and fold into a host of curlicues and other 3-D shapes. Some resemble the tight corkscrew ringlets of a 1980s perm or the pleats of an accordion. Others could be mistaken for a child’s spiraling scribbles.

A protein’s architecture is more than just aesthetics; it can determine how that protein functions. For instance, proteins called enzymes need a pocket where they can capture small molecules and carry out chemical reactions. And proteins that work in a protein complex, two or more proteins interacting like parts of a machine, need the right shapes to snap into formation with their partners.

Knowing the folds, coils and loops of a protein’s shape may help scientists decipher how, for example, a mutation alters that shape to cause disease. That knowledge could also help researchers make better vaccines and drugs.

For years, scientists have bombarded protein crystals with X-rays, flash frozen cells and examined them under high-powered electron microscopes, and used other methods to discover the secrets of protein shapes. Such experimental methods take “a lot of personnel time, a lot of effort and a lot of money. So it’s been slow,” says Tamir Gonen, a membrane biophysicist and Howard Hughes Medical Institute investigator at the David Geffen School of Medicine at UCLA.

Leads to frost damage on plants by triggering ice crystals at relatively high temperatures. It might be used for seeding clouds and food preservation.

Such meticulous and expensive experimental work has uncovered the 3-D structures of more than 194,000 proteins, their data files stored in the Protein Data Bank, supported by a consortium of research organizations. But the accelerating pace at which geneticists are deciphering the DNA instructions for making proteins has far outstripped structural biologists’ ability to keep up, says systems biologist Nazim Bouatta of Harvard Medical School. “The question for structural biologists was, how do we close the gap?” he says.

For many researchers, the dream has been to have computer programs that could examine the DNA of a gene and predict how the protein it encodes would fold into a 3-D shape.

Over many decades, scientists made progress toward that AI goal. But “until two years ago, we were really a long way from anything like a good solution,” says John Moult, a computational biologist at the University of Maryland’s Rockville campus.

Moult is one of the organizers of a competition: the Critical Assessment of protein Structure Prediction, or CASP. Organizers give competitors a set of proteins for their algorithms to fold and compare the machines’ predictions against experimentally determined structures. Most AIs failed to get close to the actual shapes of the proteins.
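The kind of comparison described above can be sketched in a few lines of Python. This is a simplified illustration, not CASP's actual scoring code. It is loosely modeled on GDT_TS, a score CASP uses that is not discussed in this article, and it assumes the predicted and experimental structures have already been laid on top of each other, a step the real measure handles itself. The function name and the coordinates are made up for the example.

```python
import math

def gdt_ts_like(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Score a prediction from 0 to 100 against an experimental structure.

    For each distance cutoff (in angstroms), find the fraction of residues
    whose predicted position lies within that cutoff of the experimental
    position, then average those fractions. Assumes the two structures are
    already superimposed and list the same residues in the same order.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical coordinates for four residues (one point per residue).
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.5), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 10.0, 0.0)]

print(gdt_ts_like(predicted, experimental))  # 56.25
```

In this toy case three of the four residues land within 4 angstroms of their experimental positions and one is far off, so the score comes out a little above the halfway mark. A prediction identical to the experimental structure would score 100.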

Then in 2020, AlphaFold showed up in a big way, predicting the structures of 90 percent of test proteins with high accuracy, including two-thirds with accuracy rivaling experimental methods.

Deciphering the structure of single proteins had been the core of the CASP competition since its inception in 1994. With AlphaFold’s performance, “suddenly, that was essentially done,” Moult says.

Since AlphaFold’s 2021 release, more than half a million scientists have accessed its database, Hassabis said in the news briefing. Some researchers, for example, have used AlphaFold’s predictions to help them get closer to completing a massive biological puzzle: the nuclear pore complex. Nuclear pores are key portals that allow molecules in and out of cell nuclei. Without the pores, cells wouldn’t work properly. Each pore is huge, relatively speaking, composed of about 1,000 pieces of 30 or so different proteins. Researchers had previously managed to place about 30 percent of the pieces in the puzzle.

Researchers previously solved about 30 percent of the 1,000-piece puzzle that is the nuclear pore complex. AlphaFold helped make sense of experimental data to complete 60 percent of the structure.

That puzzle is now almost 60 percent complete, after combining AlphaFold predictions with experimental techniques to understand how the pieces fit together, researchers reported in the June 10 Science.

Now that AlphaFold has pretty much solved how to fold single proteins, this year CASP organizers are asking teams to work on the next challenges: Predict the structures of RNA molecules and model how proteins interact with each other and with other molecules.

For those sorts of tasks, Moult says, deep-learning AI methods “look promising but have not yet delivered the goods.”

Being able to model protein interactions would be a big advantage because most proteins don’t operate in isolation. They work with other proteins or other molecules in cells. But AlphaFold’s accuracy at predicting how the shapes of two proteins might change when the proteins interact is “nowhere near” that of its spot-on projections for a slew of single proteins, says Forman-Kay, the University of Toronto protein biophysicist. That’s something AlphaFold’s creators acknowledge too.

The AI was trained to fold proteins by examining the contours of known structures. And many fewer multiprotein complexes than single proteins have been solved experimentally.

Allows the parasite’s male and female gametes to fuse. It is being developed as a potential vaccine.

Forman-Kay studies proteins that refuse to be confined to any particular shape. These intrinsically disordered proteins are typically as floppy as wet noodles (SN: 2/9/13, p. 26). Some will fold into defined forms when they interact with other proteins or molecules. And they can fold into new shapes when paired with different proteins or molecules to do various jobs.

AlphaFold’s predicted shapes reach a high confidence level for about 60 percent of wiggly proteins that Forman-Kay and colleagues examined, the team reported in a preliminary study posted in February at bioRxiv.org. Often the program depicts the shapeshifters as long corkscrews called alpha helices.
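For readers who want to check confidence levels themselves, here is a rough Python sketch. It relies on one detail not covered in this article: AlphaFold's downloadable PDB-format files store a per-residue confidence score, called pLDDT and running from 0 to 100, in the column that experimental files use for B-factors. The cutoff of 70 is an assumption borrowed from the AlphaFold database's own "confident" band, not necessarily the threshold Forman-Kay's team used, and the four-residue protein is invented for the demonstration.

```python
def residue_confidences(pdb_lines):
    """Return {residue number: score} read from the B-factor column.

    Only alpha-carbon (CA) atoms are read, so each residue counts once.
    Column positions follow the fixed-width PDB ATOM record layout.
    """
    scores = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores

def fraction_confident(scores, threshold=70.0):
    """Fraction of residues scoring at or above the threshold (assumed cutoff)."""
    if not scores:
        raise ValueError("no residues found")
    return sum(s >= threshold for s in scores.values()) / len(scores)

def demo_record(serial, res_name, res_seq, score):
    """Build a made-up, correctly aligned CA record for the demonstration."""
    return (f"ATOM  {serial:>5}  CA  {res_name:>3} A{res_seq:>4}    "
            f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{score:>6.2f}")

# A hypothetical four-residue protein: two confident residues, two not.
demo = [demo_record(1, "MET", 1, 95.2), demo_record(2, "ASP", 2, 88.0),
        demo_record(3, "GLY", 3, 64.5), demo_record(4, "SER", 4, 31.9)]

scores = residue_confidences(demo)
print(fraction_confident(scores))  # 0.5
```

To try this on a real prediction, pass the lines of a file downloaded from the AlphaFold database to residue_confidences instead of the demo records. A high score says only that the program is confident, which, as the examples that follow show, is not the same as being right.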

Forman-Kay’s group compared AlphaFold’s predictions for three disordered proteins with experimental data. The structure that the AI assigned to a protein called alpha-synuclein resembles the shape that the protein takes when it interacts with lipids, the team found. But that’s not the way the protein looks all the time.

For another protein, called eukaryotic translation initiation factor 4E-binding protein 2, AlphaFold predicted a mishmash of the protein’s two shapes when working with two different partners. That Frankenstein structure, which doesn’t exist in actual organisms, could mislead researchers about how the protein works, Forman-Kay and colleagues say.

Helps control production of other proteins and may be involved in learning and memory. Despite AlphaFold’s high confidence (blue) in its predictions of the lower coiled area and the ribbon structure just above it, the two would never appear at the same time.

AlphaFold may also be a little too rigid in its predictions. A static “structure doesn’t tell you everything about how a protein works,” says Jane Dyson, a structural biologist at the Scripps Research Institute in La Jolla, Calif. Even single proteins with generally well-defined structures aren’t frozen in space. Enzymes, for example, undergo small shape changes when shepherding chemical reactions.

If you ask AlphaFold to predict the structure of an enzyme, it will show a fixed image that may closely resemble what scientists have determined by X-ray crystallography, Dyson says. “But [it will] not show you any of the subtleties that are changing as the different partners” interact with the enzyme.

“The dynamics are what Mr. AlphaFold can’t give you,” Dyson says.

The computer renderings do give biologists a head start on solving problems such as how a drug might interact with a protein. But scientists should remember one thing: “These are models,” not experimentally deciphered structures, says Gonen, at UCLA.

He uses AlphaFold’s protein predictions to help make sense of experimental data, but he worries that researchers will accept the AI’s predictions as gospel. If that happens, “the risk is that it will become harder and harder and harder to justify why you need to solve an experimental structure.” That could lead to reduced funding, talent and other resources for the types of experiments needed to check the computer’s work and forge new ground, he says.

Helps protect against bacterial infections.

Harvard Medical School’s Bouatta is more optimistic. He thinks that researchers probably don’t need to invest experimental resources in the types of proteins that AlphaFold does a good job of predicting, which should help structural biologists triage where to put their time and money.

“There are proteins for which AlphaFold is still struggling,” Bouatta agrees. Researchers should spend their capital there, he says. “Maybe if we generate more [experimental] data for those challenging proteins, we could use them for retraining another AI system” that could make even better predictions.

He and colleagues have already reverse engineered AlphaFold to make a version called OpenFold that researchers can train to solve other problems, such as those gnarly but important protein complexes.

Massive amounts of DNA generated by the Human Genome Project have made a wide range of biological discoveries possible and opened up new fields of research (SN: 2/12/22, p. 22). Having structural information on 200 million proteins could be similarly revolutionary, Bouatta says.

In the future, thanks to AlphaFold and its AI kin, he says, “we don’t even know what sorts of questions we might be asking.”

A version of this article appears in the September 24, 2022 issue of Science News.

DeepMind and EMBL-EBI. AlphaFold predicts structure of almost every catalogued protein known to science. Published July 28, 2022.

S. Mosalaganti et al. AI-based structure prediction empowers integrative structural analysis of human nuclear pores. Science. Vol. 376, June 10, 2022, p. 6598. doi:10.1126/science.abm9506.

K.-T. Ko et al. Structure of the malaria vaccine candidate Pfs48/45 and its recognition by transmission blocking antibodies. bioRxiv.org. May 25, 2022. doi:10.1101/2022.05.24.493318.

T.R. Alderson et al. Systematic identification of conditionally folded intrinsically disordered regions by AlphaFold2. bioRxiv.org. February 18, 2022.

J. Jumper et al. Highly accurate protein structure prediction with AlphaFold. Nature. Vol. 596, July 15, 2021, p. 583. doi:10.1038/s41586-021-03819-2.

M. Baek et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. Published online July 15, 2021. doi:10.1126/science.abj8754.

Tina Hesman Saey is the senior staff writer and reports on molecular biology. She has a Ph.D. in molecular genetics from Washington University in St. Louis and a master’s degree in science journalism from Boston University.
